    Second-order Democratic Aggregation

    Aggregated second-order features extracted from deep convolutional networks have been shown to be effective for texture generation, fine-grained recognition, material classification, and scene understanding. In this paper, we study a class of orderless aggregation functions designed to minimize interference, or equalize contributions, in the context of second-order features. We show that they can be computed just as efficiently as their first-order counterparts and have favorable properties over aggregation by summation. Another line of work has shown that matrix power normalization after aggregation can significantly improve the generalization of second-order representations. We show that matrix power normalization implicitly equalizes contributions during aggregation, thus establishing a connection between matrix normalization techniques and prior work on minimizing interference. Based on this analysis, we present γ-democratic aggregators that interpolate between sum pooling (γ=1) and democratic pooling (γ=0), outperforming both on several classification tasks. Moreover, unlike power normalization, γ-democratic aggregations can be computed in a low-dimensional space by sketching, which allows the use of very high-dimensional second-order features. This results in state-of-the-art performance on several datasets.
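
    For intuition, the sketch below mimics the γ-democratic idea with a simple fixed-point iteration over per-feature contribution weights; the function names, the square-root damping, and the iteration count are illustrative assumptions, not the paper's exact dampened solver.

```python
import numpy as np

def gamma_democratic_weights(X, gamma=0.5, n_iter=10, eps=1e-12):
    """Sketch of gamma-democratic weighting for local features X of shape (n, d).

    Approximately solves for weights alpha such that
        alpha_i * sum_j alpha_j * k(x_i, x_j)  ~  (sum_j k(x_i, x_j)) ** gamma,
    using a simple Sinkhorn-style fixed point (a hypothetical simplification of
    the dampened scheme described in the paper). gamma=1 recovers plain sum
    pooling (alpha = 1); gamma=0 equalizes contributions (democratic pooling).
    """
    K = X @ X.T                                  # linear kernel between local features
    K = np.maximum(K, 0)                         # keep the iteration well behaved (assumption)
    target = np.maximum(K.sum(axis=1), eps) ** gamma
    alpha = np.ones(len(X))
    for _ in range(n_iter):
        contrib = alpha * (K @ alpha)            # current contribution of each feature
        alpha = alpha * np.sqrt(target / np.maximum(contrib, eps))
    return alpha

def second_order_aggregate(X, gamma=0.5):
    """Aggregate weighted outer products sum_i alpha_i x_i x_i^T into one descriptor."""
    alpha = gamma_democratic_weights(X, gamma)
    A = (X * alpha[:, None]).T @ X
    return A.flatten()

# toy usage: 100 local CNN features of dimension 64
features = np.random.randn(100, 64)
descriptor = second_order_aggregate(features, gamma=0.5)
```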

    Segmentation Based Interest Points and Evaluation of Unsupervised Image Segmentation Methods

    COLTRANE: ConvolutiOnaL TRAjectory NEtwork for Deep Map Inference

    The process of automatically generating a road map from GPS trajectories, called map inference, remains a challenging task on geospatial data from a variety of domains, as the majority of existing studies focus on road maps in cities. Inherently, existing algorithms are not guaranteed to work on unusual geospatial sites such as airport tarmacs, pedestrianized paths and shortcuts, or animal migration routes. Moreover, deep learning has not been explored well enough for such tasks. This paper introduces COLTRANE, ConvolutiOnaL TRAjectory NEtwork, a novel deep map inference framework which operates on GPS trajectories collected in various environments. The framework includes an Iterated Trajectory Mean Shift (ITMS) module to localize road centerlines, which copes with noisy GPS data points. A Convolutional Neural Network trained on our novel trajectory descriptor is then introduced into the framework to detect and accurately classify junctions for refinement of the road maps. COLTRANE yields up to 37% improvement in F1 score over existing methods on two distinct real-world datasets: city roads and airport tarmac.
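
    The fragment below is a minimal mean-shift sketch in the spirit of localizing centerlines from noisy GPS points; the bandwidth, iteration count, and function names are assumptions for illustration and do not reproduce the paper's ITMS module.

```python
import numpy as np

def mean_shift_centerline(points, bandwidth=5.0, n_iter=10):
    """Shift noisy GPS points towards local density peaks so that points from
    many trajectories collapse onto approximate road centerlines.

    `points` is an (n, 2) array in a metric coordinate system (e.g. metres);
    bandwidth and iteration count are illustrative, not the paper's settings.
    """
    shifted = points.astype(float).copy()
    for _ in range(n_iter):
        # pairwise squared distances between current estimates and the raw points
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))            # Gaussian kernel weights
        shifted = (w @ points) / w.sum(axis=1, keepdims=True)
    return shifted

# toy usage: jittered points along a straight 100 m road segment
t = np.linspace(0, 100, 200)
gps = np.stack([t, np.zeros_like(t)], axis=1) + np.random.randn(200, 2) * 3.0
centerline_points = mean_shift_centerline(gps, bandwidth=5.0)
```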

    Higher-order occurrence pooling for bags-of-words: visual concept detection

    In object recognition, the Bag-of-Words model assumes: i) extraction of local descriptors from images, ii) embedding the descriptors by a coder into a given visual vocabulary space, which results in mid-level features, and iii) extracting statistics from the mid-level features with a pooling operator that aggregates occurrences of visual words in images into signatures, which we refer to as First-order Occurrence Pooling. This paper investigates higher-order pooling that aggregates over co-occurrences of visual words. We derive Bag-of-Words with Higher-order Occurrence Pooling based on linearisation of the Minor Polynomial Kernel, and extend this model to work with various pooling operators. This approach is then effectively used for fusion of various descriptor types. Moreover, we introduce Higher-order Occurrence Pooling performed directly on local image descriptors, as well as a novel pooling operator that reduces the correlation in the image signatures. Finally, First-, Second-, and Third-order Occurrence Pooling are evaluated given various coders and pooling operators on several widely used benchmarks. The proposed methods are compared to other approaches such as Fisher Vector Encoding and demonstrate improved results.
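
    As a rough illustration, the snippet below aggregates co-occurrences of visual-word activations into a second-order signature; the operator choices and upper-triangle vectorisation are simplifications, not the paper's full derivation from the Minor Polynomial Kernel.

```python
import numpy as np

def second_order_occurrence_pooling(codes, operator="avg"):
    """Sketch of second-order occurrence pooling over mid-level features.

    `codes` is an (n, K) matrix of per-descriptor visual-word activations
    (e.g. soft-assignment codes). Per-descriptor co-occurrences phi_i phi_i^T
    are aggregated with average or max pooling and the upper triangle of the
    symmetric result is kept as the image signature.
    """
    n, K = codes.shape
    cooc = np.einsum('ni,nj->nij', codes, codes)      # per-descriptor co-occurrence matrices
    if operator == "avg":
        pooled = cooc.mean(axis=0)
    elif operator == "max":
        pooled = cooc.max(axis=0)
    else:
        raise ValueError(operator)
    iu = np.triu_indices(K)                           # symmetric matrix: keep upper triangle
    return pooled[iu]

# toy usage: 500 descriptors coded against a 50-word vocabulary
codes = np.abs(np.random.randn(500, 50))
signature = second_order_occurrence_pooling(codes, operator="avg")
```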

    Soft assignment of visual words as Linear Coordinate Coding and optimisation of its reconstruction error

    Visual Word Uncertainty, also referred to as Soft Assignment, is a well-established technique for representing images as histograms by flexible assignment of image descriptors to a visual vocabulary. Recently, the attention of the object category recognition community has been drawn to Linear Coordinate Coding methods. In this work, we focus on Soft Assignment as it yields good results amidst competitive methods. We show that one can take two views on Soft Assignment: as an approach derived from the Gaussian Mixture Model, or as a special case of Linear Coordinate Coding. The latter view helps us propose how to optimise the smoothing factor of Soft Assignment in a way that minimises the descriptor reconstruction error and maximises classification performance. In turn, this renders the tedious cross-validation otherwise needed to establish this parameter unnecessary and makes Soft Assignment a handy technique. We demonstrate state-of-the-art performance of such optimised assignment on two image datasets and several types of descriptors.
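
    A minimal sketch of the two views is given below: soft-assignment codes computed from a Gaussian kernel, their reconstruction error under the Linear Coordinate Coding reading, and a plain grid search over the smoothing factor. The grid search merely stands in for the paper's optimisation, and all names and values are illustrative.

```python
import numpy as np

def soft_assignment(X, vocab, sigma):
    """Soft-assignment codes: alpha_k proportional to exp(-||x - m_k||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    return A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)

def reconstruction_error(X, vocab, sigma):
    """Mean error of reconstructing descriptors as convex combinations of visual
    words, i.e. the Linear Coordinate Coding view of Soft Assignment."""
    A = soft_assignment(X, vocab, sigma)
    return np.mean(((X - A @ vocab) ** 2).sum(axis=1))

def pick_sigma(X, vocab, candidates):
    """Choose the smoothing factor minimising reconstruction error (hypothetical
    grid search standing in for the paper's analytical optimisation)."""
    errs = [reconstruction_error(X, vocab, s) for s in candidates]
    return candidates[int(np.argmin(errs))]

# toy usage: 1000 SIFT-like descriptors, 100-word vocabulary
X = np.random.randn(1000, 128)
vocab = np.random.randn(100, 128)
sigma = pick_sigma(X, vocab, candidates=np.array([0.1, 0.5, 1.0, 2.0, 5.0]))
```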

    Protected Pooling Method of Sparse Coding in Visual Classification

    Spatial Coordinate Coding to reduce histogram representations, Dominant Angle and Colour Pyramid Match

    Spatial Pyramid Match lies at the heart of modern object category recognition systems. Once image descriptors are expressed as histograms of visual words, they are further deployed across a spatial pyramid with coarse-to-fine spatial location grids. However, such a representation results in extremely long histogram vectors of 200K or more elements, increasing computational and memory requirements. This paper investigates alternative ways of introducing spatial information during the formation of histograms. Specifically, we propose to apply spatial location information at the descriptor level and refer to this as Spatial Coordinate Coding. Alternatively, the x, y, radius, or angle component is used to perform semi-coding: one of the spatial components is added at the descriptor level whilst Pyramid Match is applied to another. Lastly, we demonstrate that Pyramid Match can be applied robustly to other measurements: Dominant Angle and Colour. We demonstrate state-of-the-art results on two datasets by means of Soft Assignment and Sparse Coding.
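
    The snippet below sketches the descriptor-level idea: normalised coordinates, or a single spatial component such as the radius, are appended to each local descriptor before coding. The weighting parameter and helper names are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def spatial_coordinate_coding(descriptors, xy, image_size, beta=1.0):
    """Append normalised (x, y) locations to local descriptors so spatial layout
    enters at the descriptor level instead of through a spatial pyramid.

    `descriptors` is (n, d), `xy` is (n, 2) pixel locations, `beta` weights the
    spatial part (illustrative parameter).
    """
    w, h = image_size
    xy_norm = xy / np.array([w, h], dtype=float)      # map coordinates to [0, 1]
    return np.hstack([descriptors, beta * xy_norm])

def radius_component(xy, image_size):
    """Single spatial component for semi-coding: normalised radius from the image centre."""
    w, h = image_size
    centre = np.array([w, h], dtype=float) / 2
    r = np.linalg.norm(xy - centre, axis=1)
    return r / r.max()

# toy usage: 300 descriptors from a 640x480 image
desc = np.random.randn(300, 128)
xy = np.random.rand(300, 2) * [640, 480]
augmented = spatial_coordinate_coding(desc, xy, image_size=(640, 480))
```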

    Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection

    Bag-of-Words lies at the heart of modern object category recognition systems. After descriptors are extracted from images, they are expressed as vectors representing visual word content, referred to as mid-level features. In this paper, we review a number of techniques for generating mid-level features, including two variants of Soft Assignment, Locality-constrained Linear Coding, and Sparse Coding. We also isolate the underlying properties that affect their performance. Moreover, we investigate various pooling methods that aggregate mid-level features into vectors representing images. Average pooling, Max-pooling, and a family of likelihood-inspired pooling strategies are scrutinised. We demonstrate how coding schemes and pooling methods interact with each other. We generalise the investigated pooling methods to account for descriptor interdependence and introduce an intuitive concept of improved pooling. We also propose a coding-related improvement to increase its speed. Lastly, state-of-the-art performance in classification is demonstrated on the Caltech101, Flower17, and ImageCLEF11 datasets.
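
    For reference, the toy snippet below applies the two basic pooling operators compared in the paper to the same mid-level codes; the likelihood-inspired and improved variants are omitted, and the names are illustrative.

```python
import numpy as np

def pool(codes, method="avg"):
    """Aggregate per-descriptor mid-level features of shape (n, K) into one
    image signature using average or max pooling."""
    if method == "avg":
        return codes.mean(axis=0)
    if method == "max":
        return codes.max(axis=0)
    raise ValueError(method)

# toy usage: compare the two operators on the same sparse-coding-like codes
codes = np.abs(np.random.randn(400, 200))
sig_avg = pool(codes, "avg")
sig_max = pool(codes, "max")
```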

    Improving few-shot learning by spatially-aware matching and cross-transformer

    Current few-shot learning models capture visual object relations in the so-called meta-learning setting under a fixed-resolution input. However, such models have limited generalization ability under scale and location mismatch between objects, as only a few samples from the target classes are provided. The lack of a mechanism to match the scale and location between pairs of compared images therefore leads to performance degradation. The importance of image contents varies across coarse-to-fine scales depending on the object and its class label; e.g., generic objects and scenes rely on their global appearance, while fine-grained objects rely more on localized visual patterns. In this paper, we study the impact of scale and location mismatch in the few-shot learning scenario and propose a novel Spatially-aware Matching (SM) scheme to effectively perform matching across multiple scales and locations, learning image relations by giving the highest weights to the best matching pairs. SM is trained to activate the most related locations and scales between support and query data. We apply and evaluate SM on various few-shot learning models and backbones for comprehensive evaluation. Furthermore, we leverage an auxiliary self-supervisory discriminator to train and predict the spatial- and scale-level index of the feature vectors we use. Finally, we develop a novel transformer-based pipeline to exploit self- and cross-attention in a spatially-aware matching process. Our proposed design is orthogonal to the choice of backbone and/or comparator.
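
    The toy function below conveys the matching-across-scales-and-locations idea by scoring a support/query pair with the best cosine similarity over all scale and location pairs; the actual SM module learns soft weights over these pairs rather than taking a hard maximum, so this is a simplification with hypothetical names.

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def spatially_aware_score(support_maps, query_maps):
    """Each element of support_maps / query_maps is an (h*w, d) grid of feature
    vectors extracted at one scale. The score is the best cosine similarity
    over all pairs of scales and spatial locations (hard-max simplification)."""
    best = -1.0
    for S in support_maps:                 # scales of the support image
        for Q in query_maps:               # scales of the query image
            for s in S:                    # spatial locations in the support grid
                for q in Q:                # spatial locations in the query grid
                    best = max(best, cosine(s, q))
    return best

# toy usage: two scales per image, 4x4 and 2x2 grids of 64-d features
support = [np.random.randn(16, 64), np.random.randn(4, 64)]
query = [np.random.randn(16, 64), np.random.randn(4, 64)]
score = spatially_aware_score(support, query)
```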